Using Linear Interpolation and Weighted Reordering Hypotheses in the Moses System
Abstract
This paper introduces a novel reordering model into the open-source Moses toolkit. The main idea is to provide weighted reordering hypotheses to the SMT decoder. These hypotheses are built by a first-step Ngram-based SMT translation from the source language into a third representation, called the reordered source language. Each hypothesis carries its own weight, provided by the Ngram-based decoder. The proposed reordering technique yields better and more efficient translation than both distance-based and lexicalized reordering. In addition to this reordering approach, the paper describes a domain adaptation technique based on a linear combination of a specific in-domain translation model and an extra out-of-domain translation model. Results for both approaches are reported on the Arabic-to-English 2008 IWSLT task. When the weighted reordering hypotheses and the domain adaptation technique are both implemented in the final translation system, translation results improve by up to 2.5 BLEU points over a standard state-of-the-art Moses baseline system.
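The linear combination of translation models mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which model scores are combined or what weights are used, so everything here is an assumption made for illustration: the function name `interpolate`, the weight `lam`, the toy phrase pairs, and the choice to interpolate a single conditional phrase probability p(target | source) as lam * p_in + (1 - lam) * p_out.

```python
from typing import Dict, Tuple

# A toy phrase table: (source phrase, target phrase) -> p(target | source).
# Hypothetical structure for illustration; real Moses phrase tables carry several scores.
PhraseTable = Dict[Tuple[str, str], float]


def interpolate(in_domain: PhraseTable, out_domain: PhraseTable, lam: float) -> PhraseTable:
    """Linearly interpolate two phrase tables with weight lam on the in-domain model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # A phrase pair missing from one table contributes probability 0 from that table.
    pairs = set(in_domain) | set(out_domain)
    return {
        pair: lam * in_domain.get(pair, 0.0) + (1.0 - lam) * out_domain.get(pair, 0.0)
        for pair in pairs
    }


# Toy data (invented, not taken from the paper).
in_domain = {("kitab", "book"): 0.8, ("kitab", "the book"): 0.2}
out_domain = {("kitab", "book"): 0.5, ("kitab", "writing"): 0.5}

combined = interpolate(in_domain, out_domain, lam=0.75)
for pair, prob in sorted(combined.items()):
    print(pair, round(prob, 3))
```

Because each input table is a proper distribution over targets for the source phrase, the interpolated scores for that source phrase still sum to one. In practice the weight would be tuned on held-out in-domain data rather than fixed by hand.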
Related resources
Improvements in dynamic programming beam search for phrase-based statistical machine translation
Search is a central component of any statistical machine translation system. We describe the search for phrase-based SMT in detail and show its importance for achieving good translation quality. We introduce an explicit distinction between reordering and lexical hypotheses and organize the pruning accordingly. We show that for the large Chinese-English NIST task already a small number of lexica...
Proposed Pilot Pattern Methods for Improvement DVB-T System Performance
Recently, orthogonal frequency division multiplexing (OFDM) has been extensively used in communications systems to resist channel impairments in frequency-selective channels. OFDM is a multicarrier transmission technology for wireless environments that uses a large number of orthogonal subcarriers to transmit information. OFDM is one of the most important blocks in digital video broadcast-terrestr...
The Operation Sequence Model - Combining N-Gram-Based and Phrase-Based Statistical Machine Translation
In this article, we present a novel machine translation model, the Operation Sequence Model (OSM), that combines the benefits of phrase-based and N-gram-based SMT and remedies their drawbacks. The model represents the translation process as a linear sequence of operations. The sequence includes not only translation operations but also reordering operations. As in Ngram-based SMT, the model is: ...
Word-Order Issues in English-to-Urdu Statistical Machine Translation
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experi...
Computing multiple weighted reordering hypotheses for a statistical machine translation phrase-based systems
Reordering is one source of error in statistical machine translation (SMT). This paper extends the study of the statistical machine reordering (SMR) approach, which uses the powerful techniques of SMT systems to solve reordering problems. Here, the novelties lie in: (1) using the SMR approach in an SMT phrase-based system, (2) adding a feature function in the SMR step, and (3) analyzing th...